A Generic Inverted Index Framework for Similarity Search on the GPU - Technical Report

Authors

  • Jingbo Zhou
  • Qi Guo
  • H. V. Jagadish
  • Luboš Krčál
  • Siyuan Liu
  • Wenhao Luan
  • Anthony K. H. Tung
  • Yueji Yang
  • Yuxin Zheng
Abstract

Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high-dimensional vectors. To perform similarity search on these data, existing works mainly create customized indexes for different data types. Because these customized indexes are so diverse, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support approximate nearest neighbor search under different similarity measures by applying Locality Sensitive Hashing schemes, as well as similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
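
The abstract does not spell out how an inverted index answers similarity queries. As a minimal CPU-only sketch, assuming each object is reduced to a set of keywords (for example raw tokens or LSH signatures) and similarity is approximated by the number of keywords shared with the query, a batch of queries could be processed as follows. The function names are hypothetical, and this is an illustration of the general idea rather than GENIE's GPU implementation.

```python
from collections import Counter, defaultdict


def build_inverted_index(objects):
    """Map each keyword to the ids of the objects that contain it."""
    index = defaultdict(list)
    for obj_id, keywords in enumerate(objects):
        for kw in set(keywords):  # count each keyword once per object
            index[kw].append(obj_id)
    return index


def top_k(index, query, k):
    """Return the k objects sharing the most keywords with the query."""
    counts = Counter()
    for kw in set(query):
        # .get avoids inserting unseen keywords into the defaultdict
        for obj_id in index.get(kw, ()):
            counts[obj_id] += 1
    # Highest count first; ties broken by smaller object id.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def batch_search(index, queries, k):
    """Answer several queries against the same index."""
    return [top_k(index, q, k) for q in queries]
```

Each query in `batch_search` is independent of the others, which is the property that would make a batch of queries a natural candidate for parallel execution.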


Similar resources

Generic Inverted Index on the GPU

Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to dev...


Technical Report: Parallel Distance Threshold Query Processing for Spatiotemporal Trajectory Databases on the GPU

Processing moving object trajectories arises in many application domains and has been addressed by practitioners in the spatiotemporal database and Geographical Information System communities. In this work, we focus on a trajectory similarity search, the distance threshold query, which finds all trajectories within a given distance d of a search trajectory over a time interval. We demonstrate t...


Technical Report: Towards Efficient Indexing of Spatiotemporal Trajectories on the GPU for Distance Threshold Similarity Searches

Applications in many domains require processing moving object trajectories. In this work, we focus on a trajectory similarity search that finds all trajectories within a given distance of a query trajectory over a time interval, which we call the distance threshold similarity search. We develop three indexing strategies with spatial, temporal and spatiotemporal selectivity for the GPU that diff...


Accelerated BLAST Performance with Tera-BLAST™: a comparison of FPGA versus GPU and CPU BLAST implementations

A number of technologies have emerged for accelerating similarity search algorithms in bioinformatics, including the use of field programmable gate arrays (FPGA), graphics processing units (GPU), and clusters of standard multicore CPUs. Here we present Tera-BLAST™, an FPGA-accelerated implementation of the BLAST algorithm, and compare the performance to GPU-accelerated BLAST and the industry s...


Scaling Out All Pairs Similarity Search with MapReduce

Given a collection of objects, the All Pairs Similarity Search problem involves discovering all those pairs of objects whose similarity is above a certain threshold. In this paper we focus on document collections which are characterized by a sparseness that allows effective pruning strategies. Our contribution is a new parallel algorithm within the MapReduce framework. The proposed algorithm is...



Publication date: 2016